{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Week 03 Assignment Covid\n",
"\n",
"New types of data and new data science technologies enable new research. These new technologies are technologies such as the ability to combine existing data or the ability to generate synthetic data from existing knowledge. This week casus is based on such research. Data is generated by Synthea's COVID-19 module. The data was constructed using three peer-reviewed publications published in the early stages of the global pandemic, when less was known, along with emerging resources, data, publications, and clinical knowledge. The simulation outputs synthetic Electronic Health Records (EHR), including the daily consumption of Personal Protective Equipment (PPE) and other medical devices and supplies. The Data is stored in separate tables to avoid redundancy, with as a concequence that tables need to be combined and reorganized in dataframes for analysing purpose.\n",
"\n",
"Keywords: merge data, subset data, clean data, generate data\n",
"\n",
"You will learn about combining data with pandas and numpy and you will learn to visualize with bokeh. Concretely, you will preprocess the partly Synthetic Covid data in an appropiate format in order to conduct statistical and visual analysis. Learning objectives\n",
"\n",
"- Combine multiple data sources for analysis\n",
"- Read, inspect, clean, reshape data\n",
"- Visualize data using bokeh\n",
"- Maintain development environment \n",
"- Apply coding standards and FAIR principles\n",
"- Reshape the dataset into a format suitable for visual and statistical analysis\n",
"- Use widgets to make the plot interactive \n",
"- Use GIS libraries to plot geographical data\n",
"\n",
"Tutorials about combining data: https://github.com/fenna/BFVM22PROG1/blob/main/tutorials/tutorial_combine_data.ipynb\n",
"\n",
"study case combining data:https://github.com/fenna/BFVM22PROG1/blob/main/study_cases/adults_who_binge_drank_in_hot_towns.ipynb\n",
"\n",
"\n",
"Please add the topics you want to learn about here: https://padlet.com/ffeenstra1/kzh2chaqleq3iovu\n",
"\n",
"\n",
"Your job is to **visualize the lab values taken for COVID-19 patients of survived versus not survived patients**. \n",
"\n",
"The assignment consists of 6 parts:\n",
"\n",
"- [part 1: load the data](#0)\n",
" - [Exercise 1.1](#ex-11)\n",
"- [part 2: data wrangling](#1)\n",
" - [Exercise 2.1](#ex-21)\n",
"- [part 3: more wrangling](#2)\n",
" - [Exercise 3.1](#ex-31)\n",
"- [part 4: plot the data](#3)\n",
" - [Exercise 4.1](#ex-41)\n",
"- [part 5: plot patient location](#5)\n",
" - [Exercise 5.1](#ex-51)\n",
"\n",
"\n",
"Part 1 and 4 are mandatory, part 5 is optional (bonus)\n",
"Mind you that you cannot copy code without referencing the code. If you copy code you need to be able to explain your code verbally and you will not get the full score. \n",
"\n",
"\n",
"## About the data\n",
"\n",
"The data is generated by Synthea's COVID-19 module. The data was constructed using three peer-reviewed publications published in the early stages of the global pandemic, when less was known, along with emerging resources, data, publications, and clinical knowledge. The simulation outputs synthetic Electronic Health Records (EHR), including the daily consumption of Personal Protective Equipment (PPE) and other medical devices and supplies. For this assignment the `conditions`, `patients`, `observations`, `careplans` and `encounters` table will be used. The Data is stored in separate tables to avoid redundancy, with as a concequence that tables need to be combined and reorganized in dataframes for analysing purpose.\n",
"\n",
"Source: Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intelligence-Based Medicine. 2020 Nov;1:100007. https://doi.org/10.1016/j.ibmed.2020.100007\n",
"\n",
"Please download the data\n",
"\n",
"#### Covid Patients\n",
"Patients are considered Covid patients if they are identified with `CODE` `840539006`\n",
"\n",
"\n",
"#### Survivors\n",
"Patients that had covid and where tested negative after isolation have tested code `94531-1`, SARS-CoV-2 RNA Pnl Resp NAA+probe (covid-sars test) + a value of `Not detected (qualifier value)`. These patients are considered to be survived covid patients. \n",
"\n",
"#### Non-Survivors\n",
"Patients that did not survived Covid have a `DEATHDATE` which is not null. \n",
"\n",
"\n",
"#### Lab values COVID-19 patients\n",
"\n",
"Patients are monitored for blood and heart conditions once they are admitted in Hospital or under treatment. The lab values of interest are as follow: \n",
"\n",
"- `48065-7` Fibrin D-dimer FEU [Mass/volume] in Platelet poor plasma\n",
"- `26881-3` Interleukin 6 [Mass/volume] in Serum or Plasma\n",
"- `2276-4` Ferritin [Mass/volume] in Serum or Plasma\n",
"- `89579-7` Troponin I.cardiac [Mass/volume] in Serum or Plasma by High sensitivity method\n",
"- `731-0` Lymphocytes [#/volume] in Blood by Automated count\n",
"- `14804-9` Lactate dehydrogenase [Enzymatic activity/volume] in Serum or Plasma by Lactate to pyruvate reaction\n"
]
},
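The cohort definitions above (COVID-19 patients, survivors, non-survivors and the lab values of interest) can be sketched with pandas on a few invented rows. This is a minimal illustration, not the assignment solution: the toy data is made up, and the column names `Id`, `PATIENT`, `CODE` and `VALUE` are assumptions about the CSV layout (only `CODE` and `DEATHDATE` are named in the text above).

```python
import pandas as pd

COVID_CODE = "840539006"                     # condition code for COVID-19
TEST_CODE = "94531-1"                        # SARS-CoV-2 RNA panel
NEGATIVE = "Not detected (qualifier value)"  # value of a negative test
LAB_CODES = ["48065-7", "26881-3", "2276-4", "89579-7", "731-0", "14804-9"]

# Toy stand-ins for the real tables; rows and column names are assumptions.
patients = pd.DataFrame({
    "Id": ["p1", "p2", "p3"],
    "DEATHDATE": [None, "2020-04-01", None],
})
conditions = pd.DataFrame({
    "PATIENT": ["p1", "p2", "p3"],
    "CODE": ["840539006", "840539006", "44054006"],
})
observations = pd.DataFrame({
    "PATIENT": ["p1", "p1", "p2"],
    "CODE": ["94531-1", "2276-4", "2276-4"],
    "VALUE": [NEGATIVE, "512.3", "1890.0"],
})

# COVID-19 patients: anyone with the COVID-19 condition code.
covid_ids = set(conditions.loc[conditions["CODE"] == COVID_CODE, "PATIENT"])

# Survivors: COVID-19 patients with a negative SARS-CoV-2 test.
negative_tests = observations[
    (observations["CODE"] == TEST_CODE) & (observations["VALUE"] == NEGATIVE)
]
survivor_ids = covid_ids & set(negative_tests["PATIENT"])

# Non-survivors: COVID-19 patients whose DEATHDATE is not null.
deceased = patients[patients["DEATHDATE"].notna()]
non_survivor_ids = covid_ids & set(deceased["Id"])

# Lab values of COVID-19 patients, labelled by outcome.
labs = observations[
    observations["CODE"].isin(LAB_CODES) & observations["PATIENT"].isin(covid_ids)
].copy()
labs["VALUE"] = pd.to_numeric(labs["VALUE"])
labs["SURVIVED"] = labs["PATIENT"].isin(survivor_ids)
print(labs)
```

Keeping the cohorts as sets of patient identifiers makes the later filtering a plain `isin` call; with the real tables a `merge` on the patient identifier would be an equally reasonable choice.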
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Part 1: Load the data (20 pt)\n",
"\n",
"Instructions: Load the data of the following files. \n",
"Preferably we read the data not with a hard coded data path but using a config file. See https://fennaf.gitbook.io/bfvm22prog1/data-processing/configuration-files/yaml\n",
"\n",
"- conditions.csv\n",
"- patients.csv\n",
"- observations.csv\n",
"- careplans.csv\n",
"- encounters.csv\n",
"\n",
"Get yourself familiar with the data. Create some meaningful overviews. Answer the following questions\n",
"\n",
"1. How many patients are there\n",
"2. How many covid-patients are there\n",
"3. How many patients do have a 'Hospital admission for isolation' encounter\n",
" \n",
"\n",
" Hints\n",
"
\n",
"
\n",
"\n",
" Hints\n",
"
\n",
"
\n",
"
\n",
"\n",
"\n",
"### 4.1 Code your solution"
]
},
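The three counting questions of Part 1 could be approached along the following lines. This is a hedged sketch on tiny in-memory CSV snippets rather than the real files: the rows are invented, and the column names `Id`, `PATIENT`, `CODE` and `DESCRIPTION` are assumptions about the CSV layout that should be checked against the downloaded data.

```python
import io

import pandas as pd

# In-memory stand-ins for patients.csv, conditions.csv and encounters.csv.
patients_csv = io.StringIO("""Id,DEATHDATE
p1,
p2,2020-04-01
p3,
""")
conditions_csv = io.StringIO("""PATIENT,CODE,DESCRIPTION
p1,840539006,COVID-19
p2,840539006,COVID-19
p2,840539006,COVID-19
p3,44054006,Diabetes
""")
encounters_csv = io.StringIO("""PATIENT,DESCRIPTION
p1,Hospital admission for isolation
p1,Hospital admission for isolation
p2,Encounter for symptom
""")

patients = pd.read_csv(patients_csv)
# Read codes as strings so they compare cleanly with the codes in the text.
conditions = pd.read_csv(conditions_csv, dtype={"CODE": str})
encounters = pd.read_csv(encounters_csv)

# 1. Number of patients: count unique identifiers, not rows.
n_patients = patients["Id"].nunique()

# 2. Number of COVID-19 patients: unique patients with the COVID-19 code.
is_covid = conditions["CODE"] == "840539006"
n_covid = conditions.loc[is_covid, "PATIENT"].nunique()

# 3. Patients with a 'Hospital admission for isolation' encounter.
# A literal substring match tolerates a possible suffix in the description.
is_isolation = encounters["DESCRIPTION"].str.contains(
    "Hospital admission for isolation", regex=False
)
n_isolated = encounters.loc[is_isolation, "PATIENT"].nunique()

print(n_patients, n_covid, n_isolated)
```

Counting with `nunique` rather than `len` matters here because a patient can appear on several rows of the conditions and encounters tables.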
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"codes_dict = {'48065-7': 'Fibrin D-dimer FEU in Platelet poor plasma', \n",
" '26881-3': 'Interleukin 6 in Serum or Plasma',\n",
" '2276-4': 'Ferritin in Serum or Plasma', \n",
" '89579-7': 'Troponin I.cardiac in Serum or Plasma', \n",
" '731-0': 'Lymphocytes in Blood', \n",
" '14804-9': 'Lactate dehydrogenase in Serum or Plasma'}"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"
\\n\"+\n", " \"\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"\\n\"+\n",
" \"\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"
\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"